In this exercise, I learnt how to visualise and analyse time-oriented data with R.
# Install any missing packages, then load them all
packages = c('scales', 'viridis',
             'lubridate', 'ggthemes',
             'gridExtra', 'tidyverse',
             'readxl', 'knitr',
             'data.table', 'plotly')
for (p in packages){
  if(!require(p, character.only = TRUE)){
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
attacks <- read_csv("data/eventlog.csv")
After importing the data, kable() can be used to review the structure of the imported data frame.
kable(head(attacks))
| timestamp | source_country | tz |
|---|---|---|
| 2015-03-12 15:59:16 | CN | Asia/Shanghai |
| 2015-03-12 16:00:48 | FR | Europe/Paris |
| 2015-03-12 16:02:26 | CN | Asia/Shanghai |
| 2015-03-12 16:02:38 | US | America/Chicago |
| 2015-03-12 16:03:22 | CN | Asia/Shanghai |
| 2015-03-12 16:03:45 | CN | Asia/Shanghai |
We need to convert the time zones of the different countries to a reference time zone, as the time and day of the attacks are based on the individual countries themselves.
The conversion function parses each timestamp with its appropriate time zone and then extracts the weekday and hour. Note that the time zone parameter, tz, only takes a single value.
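A minimal sketch of such a conversion function, assuming the timestamps are character strings in `YYYY-MM-DD HH:MM:SS` form. The name `make_hr_wkday` and its arguments are illustrative and may differ from the function used in the exercise.

```r
library(lubridate)

# Hypothetical helper: parse timestamps that all share ONE time zone,
# then derive the weekday and hour of each timestamp.
# ymd_hms() accepts a single tz value, so only tz[1] is used.
make_hr_wkday <- function(ts, sc, tz) {
  real_times <- ymd_hms(ts, tz = tz[1], quiet = TRUE)
  data.frame(source_country = sc,
             wkday = weekdays(real_times),  # weekday name (locale dependent)
             hour = hour(real_times))       # hour of day, 0 to 23
}

# Two rows from the table above, both in the same time zone
res <- make_hr_wkday(c("2015-03-12 15:59:16", "2015-03-12 16:02:26"),
                     c("CN", "CN"),
                     c("Asia/Shanghai", "Asia/Shanghai"))
res
```

Because only one time zone is handled per call, a helper like this would be applied once per time zone, for example after grouping the data by tz.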
We also need to convert the weekday and hour into factors so that they will be in an ordered form while plotting.
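The factor conversion can be sketched on a toy data frame as follows. The level order is an assumption: ggplot2 draws the first level of a discrete y-axis at the bottom, so listing the weekdays in reverse puts Sunday at the top. The names `wkday_levels` and `toy` are illustrative.

```r
# Assumed ordering: reversed weekdays, so Sunday is the last level
wkday_levels <- c("Saturday", "Friday", "Thursday", "Wednesday",
                  "Tuesday", "Monday", "Sunday")

# Toy data standing in for the derived weekday and hour columns
toy <- data.frame(wkday = c("Monday", "Thursday", "Sunday"),
                  hour = c(16, 0, 23))

# Fix the order of both variables by declaring the levels explicitly
toy$wkday <- factor(toy$wkday, levels = wkday_levels)
toy$hour <- factor(toy$hour, levels = 0:23)

levels(toy$wkday)
nlevels(toy$hour)
```

Declaring all 24 hour levels keeps empty hours in a fixed position even when the data contains no events for them.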
After assigning an order to the time variables, we can now group the data. We will also use na.omit to remove any rows with NA values.
grouped <- attacks %>%
  count(wkday, hour) %>%
  ungroup()
grouped <- na.omit(grouped)
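To see what this grouping step produces, here is a small sketch on made-up data. The data frame `toy` is hypothetical and only mimics the wkday and hour columns.

```r
library(dplyr)

# Made-up events; the last row has a missing weekday
toy <- data.frame(wkday = c("Monday", "Monday", "Tuesday", NA),
                  hour = c(16, 16, 9, 3))

# count() returns one row per wkday/hour combination, with the tally in n
counts <- toy %>%
  count(wkday, hour)

# Drop the combination whose weekday is NA
counts <- na.omit(counts)
counts
```

The two Monday events at hour 16 collapse into a single row with n equal to 2, which is the value later mapped to the fill colour.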
We can now use the ggplot function to plot a heatmap. We map the hour to the x-axis, the weekday to the y-axis, and the event count n to the fill colour.
p1 <- ggplot(grouped,
             aes(hour,
                 wkday,
                 fill = n)) +
  geom_tile(color = "white",
            size = 0.1) +
  theme_tufte(base_family = "Helvetica") +
  coord_equal() +
  scale_fill_viridis(name = "# of Events",
                     labels = comma) +
  labs(x = NULL,
       y = NULL,
       title = "Events per day of week & time of day") +
  theme(axis.ticks = element_blank(),
        plot.title = element_text(hjust = 0.5),
        legend.title = element_text(size = 8),
        legend.text = element_text(size = 6))
p1
After assigning it to a variable, we can apply the ggplotly function to make it interactive.
ggplotly(p1)
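As a self-contained illustration of the same step, the sketch below converts a simple scatter plot of the built-in mtcars data into an interactive widget. The plot and the names `p` and `ip` are illustrative only; the tooltip argument limits the hover text to the listed aesthetics.

```r
library(ggplot2)
library(plotly)

# A static ggplot object built from a built-in data set
p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point()

# ggplotly() converts the static plot into an interactive htmlwidget;
# tooltip restricts the hover text to the x and y values
ip <- ggplotly(p, tooltip = c("x", "y"))
class(ip)
```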
air <- read_excel("data/arrivals_by_air.xlsx")
We can extract the month and year into separate columns from the Month-Year column.
air$month <- factor(month(air$`Month-Year`),
                    levels=1:12,
                    labels=month.abb,
                    ordered=TRUE)
air$year <- year(ymd(air$`Month-Year`))
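The behaviour of this extraction can be checked on two sample dates. This sketch assumes the Month-Year column holds dates; the vector `d` is made up for illustration.

```r
library(lubridate)

# Two made-up dates standing in for the Month-Year column
d <- ymd(c("2010-01-01", "2015-12-01"))

# month() returns 1 to 12; the ordered factor relabels these as Jan to Dec
m <- factor(month(d),
            levels = 1:12,
            labels = month.abb,
            ordered = TRUE)
y <- year(d)

as.character(m)
y
m[1] < m[2]  # ordered factors support comparisons such as Jan < Dec
```

Using an ordered factor keeps the months in calendar order rather than alphabetical order when they are used for faceting.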
We can extract the country we want, e.g. New Zealand, and keep only the records from 2010 onwards using the code below.
New_Zealand <- air %>%
  select(`New Zealand`,
         month,
         year) %>%
  filter(year >= 2010)
Then, we can use the dplyr functions group_by and summarise to compute the average arrivals per month. This data will be used to plot the average arrivals for each month.
hline.data <- New_Zealand %>%
  group_by(month) %>%
  summarise(avgvalue = mean(`New Zealand`))
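A small sketch of the same group_by and summarise pattern on made-up numbers shows how one average per month is obtained. The data frame `toy` and its arrivals column are hypothetical.

```r
library(dplyr)

# Made-up arrivals: two years of data for each of two months
toy <- data.frame(month = c("Jan", "Jan", "Feb", "Feb"),
                  arrivals = c(100, 200, 50, 70))

# One row per month, holding the mean of that month across years
avg <- toy %>%
  group_by(month) %>%
  summarise(avgvalue = mean(arrivals))
avg
```

If the real column contained missing values, mean() would return NA for that month unless na.rm = TRUE were supplied.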
We can now pass both data frames to ggplot to draw the cycle plot. The geom_line function plots the arrivals over the years within each month, while geom_hline plots the average line for each month.
p2 <- ggplot() +
  geom_line(data=New_Zealand,
            aes(x=year,
                y=`New Zealand`,
                group=month),
            colour="black") +
  geom_hline(aes(yintercept=avgvalue),
             data=hline.data,
             linetype=6,
             colour="red",
             size=0.5) +
  facet_grid(~month) +
  # axis.text.x is a theme element, so it is set in theme(), not labs()
  theme(axis.text.x = element_blank()) +
  xlab("") +
  ylab("No. of Visitors")
As with the heatmap, we can apply the ggplotly function to the saved plot to make it interactive.
ggplotly(p2)